Ophthalmic images may contain identical-looking pathologies that can cause automated techniques to fail at distinguishing between different retinal degenerative diseases. In addition, reliance on large annotated datasets and the lack of knowledge distillation can restrict the deployment of ML-based clinical support systems in real-world environments. To improve the robustness and transferability of knowledge, an enhanced feature-learning module is required to extract meaningful spatial representations from the retinal subspace. Such a module, if used effectively, can detect unique disease traits and differentiate the severity of such retinal degenerative pathologies. In this work, we propose a robust disease detection architecture with three learning heads: i) a supervised encoder for retinal disease classification, ii) an unsupervised decoder for the reconstruction of disease-specific spatial information, and iii) a novel representation-learning module that learns the similarity between the encoder and decoder features, enhancing the accuracy of the model. Our experimental results on two publicly available OCT datasets illustrate that the proposed model outperforms existing state-of-the-art models in terms of accuracy, interpretability, and robustness for out-of-distribution retinal disease detection.
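As a rough illustration of how the three heads described above could be wired together, the PyTorch sketch below pairs a shared encoder with a classification head, a reconstruction decoder, and a representation head that scores the similarity between encoder and decoder features. The backbone, layer sizes, 64×64 grayscale input assumption, and equal loss weighting are illustrative assumptions; the abstract does not specify them.

```python
# A minimal sketch of a three-head OCT model, assuming 64x64 grayscale inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeHeadOCTNet(nn.Module):
    def __init__(self, num_classes: int = 4, latent_dim: int = 128):
        super().__init__()
        # Shared encoder producing a latent code from an OCT B-scan.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, latent_dim),
        )
        # Head i: supervised disease classifier.
        self.classifier = nn.Linear(latent_dim, num_classes)
        # Head ii: unsupervised decoder reconstructing disease-specific spatial detail.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16), nn.Unflatten(1, (64, 4, 4)),
            nn.ConvTranspose2d(64, 32, 4, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=4), nn.Sigmoid(),
        )
        # Head iii: representation head projecting the reconstruction back into
        # the encoder's latent space so encoder/decoder similarity can be scored.
        self.rep_head = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=4, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 16, latent_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z), z

def total_loss(model, x, y):
    logits, recon, z = model(x)
    z_rec = model.rep_head(recon)
    ce = F.cross_entropy(logits, y)                  # supervised head
    rec = F.mse_loss(recon, x)                       # reconstruction head
    sim = 1 - F.cosine_similarity(z, z_rec).mean()   # representation similarity head
    return ce + rec + sim                            # assumed equal weighting
```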
Solute transport in porous media is relevant to a wide range of applications in hydrogeology, geothermal energy, underground CO2 storage, and a variety of chemical engineering systems. Due to the complexity of solute transport in heterogeneous porous media, traditional solvers require high-resolution meshing and are therefore computationally expensive. This study explores the application of a mesh-free method based on deep learning to accelerate the simulation of solute transport. We employ Physics-informed Neural Networks (PiNN) to solve solute transport problems in homogeneous and heterogeneous porous media governed by the advection-dispersion equation. Unlike traditional neural networks that learn from large training datasets, PiNNs leverage only the strong-form mathematical model to simultaneously solve for multiple dependent or independent field variables (e.g., pressure and solute concentration fields). In this study, we construct PiNN using a periodic activation function to better represent the complex physical signals (i.e., pressure) and their derivatives (i.e., velocity). Several case studies are designed to investigate the proposed PiNN's capability to handle different degrees of complexity. A manual hyperparameter tuning method is used to find the best PiNN architecture for each test case. Point-wise error and mean square error (MSE) measures are employed to assess the performance of the PiNNs' predictions against ground truth solutions obtained analytically or numerically using the finite element method. Our findings show that the predictions of PiNN are in good agreement with the ground truth solutions while reducing computational complexity and cost by at least three orders of magnitude.
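The sketch below shows what a PiNN of this kind might look like for the 1-D advection-dispersion equation c_t + v c_x = D c_xx, using a sine (periodic) activation as the abstract suggests. The velocity, dispersion coefficient, initial pulse, domain, and network width are illustrative assumptions, and boundary-condition terms are omitted for brevity; this is not the study's configuration.

```python
# A minimal PiNN sketch for the 1-D advection-dispersion equation.
import torch
import torch.nn as nn

class SineMLP(nn.Module):
    def __init__(self, width: int = 64, depth: int = 4):
        super().__init__()
        layers, in_dim = [], 2                # inputs: (x, t)
        for _ in range(depth):
            layers.append(nn.Linear(in_dim, width))
            in_dim = width
        self.hidden = nn.ModuleList(layers)
        self.out = nn.Linear(width, 1)        # output: concentration c(x, t)

    def forward(self, xt):
        h = xt
        for layer in self.hidden:
            h = torch.sin(layer(h))           # periodic activation
        return self.out(h)

def pde_residual(model, xt, v=1.0, D=0.1):
    xt = xt.requires_grad_(True)
    c = model(xt)
    grads = torch.autograd.grad(c, xt, torch.ones_like(c), create_graph=True)[0]
    c_x, c_t = grads[:, 0:1], grads[:, 1:2]
    c_xx = torch.autograd.grad(c_x, xt, torch.ones_like(c_x), create_graph=True)[0][:, 0:1]
    return c_t + v * c_x - D * c_xx           # should vanish at collocation points

model = SineMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    xt = torch.rand(1024, 2)                  # collocation points in [0,1]^2
    loss_pde = pde_residual(model, xt).pow(2).mean()
    # Initial condition: a Gaussian pulse centred at x = 0.25 (illustrative).
    x0 = torch.rand(256, 1)
    c0 = model(torch.cat([x0, torch.zeros_like(x0)], dim=1))
    loss_ic = (c0 - torch.exp(-200 * (x0 - 0.25) ** 2)).pow(2).mean()
    loss = loss_pde + loss_ic
    opt.zero_grad(); loss.backward(); opt.step()
```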
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and in the assessment of the performance of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant meta-data. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features, and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% having linked influenza PCR test results.
The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is also provided. Finally, to help advance the field further, a discussion on the main open challenges and future research directions is given.
This study is the second phase in a series of investigations on the Optical Character Recognition (OCR) of Arabic historical documents, and examines how different modelling procedures interact with the problem. The first study investigated the effect of transformers on our custom-built Arabic dataset. One of the drawbacks of the first study was the size of the training data, only 15,000 images out of our 30 million, due to a lack of resources. In this study, we add an image enhancement layer, time and space optimization, and a post-correction layer to help the model predict the correct context. Notably, we propose an end-to-end text recognition approach that uses a vision transformer as the encoder, namely BEiT, and a vanilla Transformer as the decoder, eliminating CNNs for feature extraction and reducing the complexity of the model. The experiments show that our end-to-end model outperforms convolutional backbones. The model attained a CER of 4.46%.
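A hedged sketch of this encoder-decoder arrangement is shown below: a pretrained BEiT vision transformer supplies patch-level features (no CNN feature extractor), and a plain nn.TransformerDecoder generates the text tokens. The checkpoint name, vocabulary size, decoder depth, and maximum sequence length are illustrative assumptions rather than the paper's settings.

```python
# A minimal BEiT-encoder / vanilla-Transformer-decoder OCR sketch.
import torch
import torch.nn as nn
from transformers import BeitModel

class BeitOCR(nn.Module):
    def __init__(self, vocab_size: int = 8000, d_model: int = 768, max_len: int = 128):
        super().__init__()
        # Encoder: BEiT patch embeddings (assumed checkpoint; hidden size 768).
        self.encoder = BeitModel.from_pretrained("microsoft/beit-base-patch16-224")
        # Decoder: plain ("vanilla") Transformer decoder over text tokens.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, pixel_values, token_ids):
        # Memory: sequence of patch representations from BEiT.
        memory = self.encoder(pixel_values=pixel_values).last_hidden_state
        seq_len = token_ids.size(1)
        pos = torch.arange(seq_len, device=token_ids.device)
        tgt = self.tok_emb(token_ids) + self.pos_emb(pos)
        # Causal mask so each position only attends to previous tokens.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=token_ids.device), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)              # logits over the text vocabulary
```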
Space exploration has witnessed the Perseverance rover landing on the surface of Mars and the Ingenuity helicopter demonstrating the first flight beyond Earth. During their mission on Mars, Perseverance and Ingenuity collaborate to explore the Martian surface, with Ingenuity scouting terrain information for the rover's safe traversal. Hence, determining the relative poses between the two platforms is of paramount importance for the success of this mission. Driven by this necessity, this work proposes a robust relative localization system based on a fusion of neuromorphic vision-based measurements (NVBM) and inertial measurements. The emergence of neuromorphic vision has triggered a paradigm shift in the computer vision community, due to its unique working principle delineated by asynchronous events triggered by changes of light intensity occurring in the scene. This implies that observations cannot be acquired in static scenes due to illumination invariance. To circumvent this limitation, high-frequency active landmarks were inserted in the scene to guarantee consistent event firing. These landmarks were adopted as salient features to facilitate relative localization. A novel event-based landmark identification algorithm using Gaussian Mixture Models (GMM) was developed for matching the landmark correspondences that form our NVBM. The NVBM are fused with inertial measurements in the proposed state estimators, a landmark tracking Kalman filter (LTKF) and a translation decoupled Kalman filter (TDKF), for landmark tracking and relative localization, respectively. The system was tested in a variety of experiments and outperformed state-of-the-art approaches in terms of accuracy and range.
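To make the landmark-identification idea concrete, the sketch below clusters the events accumulated over a short time window with a Gaussian mixture, treating each component as one blinking landmark, and follows it with a toy constant-velocity Kalman predict/update step in the spirit of a landmark-tracking filter. The number of landmarks, window length, and noise levels are assumptions for illustration; the paper's LTKF/TDKF estimators are not reproduced here.

```python
# A rough sketch of GMM-based landmark identification over event coordinates,
# plus a simplified constant-velocity Kalman step for one tracked landmark.
import numpy as np
from sklearn.mixture import GaussianMixture

def identify_landmarks(events_xy: np.ndarray, n_landmarks: int = 4):
    """events_xy: (N, 2) pixel coordinates of events in one time window."""
    gmm = GaussianMixture(n_components=n_landmarks, covariance_type="full",
                          random_state=0).fit(events_xy)
    return gmm.means_, gmm.covariances_       # landmark centroids and spread

def kf_step(x, P, z, dt=0.01, q=1e-3, r=1.0):
    """One predict/update for state [px, py, vx, vy] given a position measurement z."""
    F = np.block([[np.eye(2), dt * np.eye(2)],
                  [np.zeros((2, 2)), np.eye(2)]])   # constant-velocity model
    H = np.hstack([np.eye(2), np.zeros((2, 2))])    # we observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x, P = F @ x, F @ P @ F.T + Q                    # predict
    y = z - H @ x                                    # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                   # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P        # update
```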
Advances in animal motion tracking and pose recognition have been a game changer in the study of animal behavior. Recently, an increasing number of works go 'deeper' than tracking and address the automated recognition of animals' internal states, such as emotions and pain, with the aim of improving animal welfare, making this a timely moment to systematize the field. This paper provides a comprehensive survey of computer vision-based research on the recognition of affective states and pain in animals, covering both facial and bodily behavior analysis. We summarize the efforts made so far in this topic, categorizing them along different dimensions, highlight challenges and research gaps, and provide best practice recommendations for advancing the field, as well as some future directions for research.
The generation of feasible adversarial examples is necessary for the proper evaluation of models that operate on constrained feature spaces. However, enforcing such constraints in attacks originally designed for computer vision remains a challenging task. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework supports the use cases reported in the literature and can handle both linear and non-linear constraints. We instantiate our framework into two algorithms: a gradient-based attack that introduces the constraints into the loss function to be maximized, and a multi-objective search algorithm that aims at misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective on two datasets from different domains, with a success rate of up to 100%, where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we also propose introducing engineered non-convex constraints to improve model adversarial robustness, and we demonstrate that this new defense is as effective as adversarial retraining. Our framework constitutes a starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can leverage.
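A minimal sketch of the gradient-based variant is given below: a PGD-style attack whose loss adds a penalty for violating domain constraints, so the search is pushed toward feasible perturbations. The specific constraint used (feature 0 must not exceed feature 1), the penalty weight, and the step sizes are made-up illustrations, not the paper's formulation.

```python
# A PGD-style attack with a constraint-violation penalty in the maximized loss.
import torch
import torch.nn.functional as F

def constraint_violation(x):
    # Example linear constraint g(x) = x[:, 0] - x[:, 1] <= 0 (illustrative).
    return torch.relu(x[:, 0] - x[:, 1])

def constrained_pgd(model, x, y, eps=0.1, alpha=0.01, steps=100, lam=10.0):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Maximize misclassification while penalizing constraint violations.
        loss = F.cross_entropy(logits, y) - lam * constraint_violation(x_adv).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()          # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # stay within the eps-ball
    return x_adv.detach()
```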
Automatically verifying the identity of a person by biometric means is an important application in everyday activities such as accessing banking services and security control at airports. To increase the reliability of such systems, several biometric devices are often used. Such a combined system is known as a multimodal biometric system. This paper reports a benchmarking study carried out within the framework of the BioSecure DS2 (Access Control) evaluation campaign organized by the University of Surrey, UK, involving face, fingerprint, and iris biometrics for person authentication, targeting the application of physical access control in a medium-size establishment of some 500 persons. While multimodal biometrics is a well-investigated subject, no benchmark for comparing fusion algorithms exists. Working towards this goal, we designed two sets of experiments: quality-dependent and cost-sensitive evaluation. The quality-dependent evaluation aims to assess how well fusion algorithms perform under varying raw image quality, mainly due to a change of acquisition devices. The cost-sensitive evaluation, on the other hand, investigates how well a fusion algorithm performs given restricted computation and in the presence of software and hardware faults, which result in errors such as failure-to-acquire and failure-to-match. Since multiple capture devices are available, a fusion algorithm should be able to handle this non-ideal but nevertheless realistic scenario. In both evaluations, each fusion algorithm is provided with scores from each biometric comparison subsystem as well as quality measures of both the template and the query data. The response to the call of the campaign proved very encouraging, with 22 fusion systems submitted. To the best of our knowledge, this is the first attempt to benchmark quality-based multimodal fusion algorithms.
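Purely as an illustration of what quality-based score-level fusion might look like (and not the campaign's reference algorithm), the sketch below trains a logistic-regression fuser on per-subsystem match scores, the template/query quality measures, and simple score-quality interaction terms; the feature layout and choice of classifier are assumptions.

```python
# A quality-aware score-fusion sketch: logistic regression over scores,
# quality measures, and score-quality interaction features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def _features(scores, qualities):
    # scores: (N, M) match scores from M subsystems (e.g., face, fingerprint, iris)
    # qualities: (N, 2M) quality measures for template and query samples (assumed layout)
    m = scores.shape[1]
    return np.hstack([scores, qualities, scores * qualities[:, :m]])

def train_fusion(scores, qualities, labels):
    """labels: (N,) 1 for genuine comparisons, 0 for impostor comparisons."""
    fuser = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return fuser.fit(_features(scores, qualities), labels)

def fuse(fuser, scores, qualities):
    return fuser.predict_proba(_features(scores, qualities))[:, 1]  # fused score in [0, 1]
```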